Acta Psychiatrica Scandinavica — Latest Matching Preprints

1

The VOICE-DEP study protocol: multimodal analysis of voice and discourse during medical interviews to support diagnosis and longitudinal monitoring of major depressive disorder

Zabalza-Zudaire, M.; Sayar-Beristain, O.; Fructos, P.; Nunez, F. E.; Carpio, F. F.; Garcia, E.; Ortiz, A.; Ortuno, F.; Aldaz, A.; Molero, P.

2026-07-04 psychiatry and clinical psychology 10.64898/2026.07.01.26357063 medRxiv

Top 0.1%

4.9%

Background: Major depressive disorder is a severe, recurrent and disabling condition. Although diagnosis and clinical monitoring are based on medical interviews and validated rating scales, speech and discourse analysis may provide complementary digital biomarkers reflecting depressive severity and clinical evolution. However, current evidence remains limited by methodological heterogeneity, predominantly cross-sectional designs, limited longitudinal data and underrepresentation of non-English-speaking clinical populations. Objective: The aim of the VOICE-DEP study is to develop and formalize a standardized, reproducible and clinically grounded protocol for the multimodal analysis of voice and discourse during medical interviews as a tool to support the diagnosis of depressive disorder and to assess whether speech-derived biomarkers change over time in parallel with clinical severity measures. Methods: VOICE-DEP is an observational, prospective, longitudinal pilot study of patients with major depressive disorder with a healthy control group, conducted in a hospital-based clinical setting in Spain. The study will include 25 adult patients with moderate or severe unipolar depression, with or without psychotic symptoms, and 50 healthy controls without a personal history of psychiatric disorders. Patients will be assessed at five time points: baseline (V0) and four monthly follow-up visits at 30, 60, 90 and 120 days. Healthy controls will be assessed once at baseline. The planned dataset comprises 175 voice recordings: 125 from patients and 50 from controls. At each assessment, the Montgomery-Asberg Depression Rating Scale related part of the medical interview, lasting approximately 10-30 minutes and including an initial free-speech segment, will be recorded using a standardized audio protocol. Acoustic, paralinguistic and linguistic features will be extracted and analyzed in relation to clinician-rated severity measures and self-reported symptoms. 
Ethics: This protocol has been reviewed and approved by the local Research Ethics Committee, which complies with the international standards of GCP CPMP/ICH/135/95 (Comunidad Foral de Navarra Research Ethics Committee; reference code: 2026.110). Written informed consent will be obtained from all participants before any study procedure. Voice recordings and clinical data will be pseudonymized, stored securely and processed in accordance with applicable Spanish and European data protection regulations. Expected outcomes: This protocol is expected to generate a clinically grounded Spanish-language longitudinal speech corpus and a transparent analytical framework for evaluating voice- and discourse-derived biomarkers as complementary tools for depression assessment and monitoring.

2

A Local Outpatient Practice-Level Prediction Model for Short-Term Psychiatric Emergency Presentation

Havlik, J. L.; Tyrrell, B.; Bell, N.; Polaschek, J.; Arzubi, E. R.

2026-07-01 psychiatry and clinical psychology 10.64898/2026.06.29.26356785 medRxiv

Top 0.1%

4.2%

Importance: Psychiatric emergency department (ED) presentations are difficult to predict using general medical risk stratification tools. Health information exchange (HIE) data may improve prediction by capturing fragmented care across settings. Objective: To develop and temporally validate a machine learning model using HIE and geospatial data to predict 30-day psychiatric ED presentation among outpatients receiving psychiatric care and to compare its performance with standard clinical risk scores. Design, Setting, and Participants: This retrospective cohort study included patients seen at Frontier Psychiatry with records in the Big Sky Care Connect statewide HIE. Structured clinical data were linked to zip code-level sociodemographic measures. The analytic unit was the patient snapshot, defined as all structured data available up to a given point. Models were evaluated in temporally separated train and test sets. Exposures: Predictors derived from HIE structured data, including prior utilization, diagnoses, medications, laboratory data, and zip code-linked geospatial deprivation and vulnerability measures. Main Outcomes and Measures: The primary outcome was psychiatric ED presentation within 30 days, identified from structured encounter-type fields and primary diagnosis codes for psychiatric or substance use disorders. Model discrimination was compared with a parsimonious clinical baseline model and LACE and Elixhauser scores. Results: In the test set, 343 of 16,469 snapshots (2.1%) were followed by a qualifying psychiatric ED presentation within 30 days, corresponding to 102 ED visits among 68 patients. The machine learning model showed discrimination in temporally held-out testing and outperformed the clinical baseline model as well as LACE and Elixhauser scores. 
At a prespecified decision threshold, the model reduced the number needed to evaluate from more than 40 with universal screening to 3.4 to identify 1 true-positive case, while identifying over two fifths of 30-day psychiatric ED presentations. Conclusions and Relevance: In this retrospective cohort study, a locally developed machine learning model using statewide HIE data showed improved prediction of 30-day psychiatric ED presentation compared with selected general-purpose risk scores. The results support the feasibility of HIE-enabled local psychiatric risk modeling and suggest other practices could develop similarly tailored models. Prospective studies are needed to assess clinical utility and effects on outcomes.
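
The number needed to evaluate (NNE) quoted above is the reciprocal of the positive predictive value. The short sketch below reproduces the universal-screening figure from the test-set counts in the abstract and derives the positive predictive value implied by the reported NNE of 3.4. The function and variable names are illustrative and do not come from the preprint.

```python
def number_needed_to_evaluate(true_positives: int, flagged: int) -> float:
    """Flagged snapshots reviewed per true positive found (reciprocal of PPV)."""
    if true_positives <= 0:
        raise ValueError("need at least one true positive")
    return flagged / true_positives


# Universal screening flags every snapshot, so NNE equals 1 / prevalence.
positives, snapshots = 343, 16_469  # test-set counts quoted in the abstract
nne_universal = number_needed_to_evaluate(positives, snapshots)

# The reported NNE of 3.4 implies this positive predictive value at the threshold.
ppv_at_threshold = 1 / 3.4

print(f"universal screening NNE: {nne_universal:.1f}")    # 48.0
print(f"implied PPV at threshold: {ppv_at_threshold:.3f}")  # 0.294
```

The universal-screening value of about 48 is consistent with the "more than 40" stated in the abstract.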

3

Prompt Engineering Limitations: Preliminary Evaluation of Large Language Models for Psychotherapy Safety

Ngo, N.; Dao, G.; Sano, A.

2026-07-18 psychiatry and clinical psychology 10.64898/2026.07.16.26358261 medRxiv

Top 0.1%

3.1%

Large Language Models are increasingly used in consumer-facing mental health tools, many of which claim that prompt engineering alone can ensure safe therapeutic behavior. This study evaluates that assumption by testing 20 proprietary and open-source LLMs on high-risk psychiatric scenarios, using prompts grounded in behavioral therapy principles. Prompt engineering reduced some predictable risks, such as explicit endorsement of self-harm, but consistently failed in ambiguous or clinically nuanced situations. Models frequently validated harmful statements, colluded with hallucinations, minimized symptoms, or used stigmatizing language, including in the newest and largest models. These failures reflect structural limitations such as lack of memory, insufficient contextual reasoning, and training-related biases. Prompt engineering alone is therefore insufficient for safe AI-mediated psychotherapy; clinician-guided fine-tuning, integrated safety mechanisms, and system-level oversight will be required. This work provides early evidence motivating deeper clinician-led evaluation and safety-oriented model development.

4

Violent offending in severe mental illness: the role of psychiatric comorbidity and crime type - insights from the first nationwide Norwegian registry linkage

Tesli, M.; Fazel, S.; Hauge, L. J.; Tesli, N.; Nerland, S.; Stavseth, M. R.; Bukten, A.; Ziaka, L.; Heilskov, E. R.; Haukvik, U. K.; Reneflot, A.; Skardhamar, T.; Friestad, C.; Rokicki, J.

2026-07-14 psychiatry and clinical psychology 10.64898/2026.07.10.26357737 medRxiv

Top 0.1%

2.3%

Background Individuals with severe mental illness (SMI), including schizophrenia spectrum disorders (SSD) and bipolar disorder (BD), have been shown to have an elevated risk of violent perpetration. However, no population-wide study has systematically examined how this risk varies across psychiatric comorbidity patterns and specific violent crime types. Methods Using the first nationwide Norwegian registry linkage comprising mental health and crime data, we included 3,612,215 individuals aged 15-79 years living in Norway on Jan 1, 2008, and followed them until Dec 31, 2022. We estimated absolute and relative risks (RRs) of violent offending overall and by specific violent crimes among individuals with SSD and BD. To capture clinically relevant comorbidity patterns, we included substance use disorders (SUD), common personality disorders (PD), and hyperkinetic disorders (ADHD). RR models were adjusted first for sex and age, and subsequently for co-occurring mental disorders. Findings At the population level, individuals with SMI accounted for a minority of violent offenders (SSD: 8.7%; BD: 4.6%), whereas SUD was present among a substantially larger proportion (36.8%). Absolute risk of violent offending increased markedly with psychiatric comorbidity, from e.g., 5.0% among individuals with SSD alone to 43.9% for SSD combined with SUD and PD. Compared with the remaining general population, the RR of violent offending for SSD decreased from 6.58 (95% CI 6.4-6.8, adjusted for sex and age), to 2.0 (2.0-2.1) after further adjustment for other mental disorders. Similar attenuation patterns were observed across specific violent crime types, although varying in magnitude. In contrast to SMI, elevated risks associated with SUD remained substantial after full adjustment across most crime categories. Interpretation The association between SMI and violent offending is strongly influenced by psychiatric comorbidity, particularly SUD, and varies across crime types. 
Our findings underscore the importance of identifying and treating co-occurring mental disorders and substance use, both in the clinical management of SMI and in population-level violence prevention strategies.

5

Evidence-guided AI regularization for suicidal ideation prediction in pediatric bipolar disorder

Jabbar Abdl Sattar Hamoudi, H.; Wu, M.-J.; Sanches, M.; Zunta-Soares, G. B.; Soutullo, C. A.; Soares, J. C.; Mwangi, B.

2026-06-22 psychiatry and clinical psychology 10.64898/2026.06.18.26355841 medRxiv

Top 0.1%

2.1%

Background: Suicide prediction models in psychiatry often rely on purely data-driven feature selection, which can produce unstable and clinically opaque predictor sets in modest-sized samples. We developed Evidence-Based AI LASSO (EBAL), an evidence-guided regularization framework that incorporates curated clinical evidence into feature-specific penalty factors for interpretable prediction. Methods: Baseline data from 136 youth with confirmed bipolar spectrum disorder in the Greater Houston Area Bipolar Registry were analyzed using 20 candidate clinical predictors. Forty higher-level evidence documents on suicidality and related predictor domains were curated through a structured evidence synthesis workflow and indexed as an auditable evidence corpus. An open-weight large language model assigned feature-specific penalty factors using a prespecified scoring rubric, and these penalties were used to fit a weighted LASSO model. EBAL was compared with a standard evidence-agnostic LASSO using nested leave-one-out cross-validation. Results: For suicidal ideation, EBAL achieved an AUROC of 0.768, balanced accuracy of 0.757, sensitivity of 0.758, and specificity of 0.757. The standard LASSO achieved an AUROC of 0.760 and balanced accuracy of 0.715. EBAL improved balanced accuracy (+0.042, p=0.010) and Matthews correlation coefficient (+0.079, p=0.010), while retaining fewer stable predictors than standard LASSO (11/20 vs 18/20). The strongest positive predictors were current depressed mood, duration of mood disorder illness, and comorbid generalized anxiety disorder. For suicidal behavior, both models performed near chance and retained all candidate predictors. Limitations: The study was cross-sectional, single-site, and modest in sample size, with no external validation cohort. Conclusions: EBAL produced a sparser and more clinically coherent model for suicidal ideation in pediatric bipolar disorder, but did not improve prediction of suicidal behavior. 
These findings support evidence-guided regularization as a transparent strategy for aligning psychiatric prediction models with prior clinical knowledge while preserving interpretability.
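
A weighted LASSO multiplies the shared penalty strength by a feature-specific factor, so features with a small factor are shrunk less. The sketch below illustrates only that mechanism, using coordinate descent on a squared-error loss. It is not the EBAL implementation: the abstract does not describe the loss, the solver, or the direction of the penalty scale, so the linear loss, the function names, and the toy data here are all assumptions.

```python
def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, lam, penalty_factors, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + lam * sum_j penalty_factors[j] * |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam * penalty_factors[j]) / scale
    return beta


# Toy data: two orthogonal features that contribute equally to y.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.0, 0.0, 0.0, -2.0]

# Hypothetical factors: feature 0 is lightly penalised, feature 1 heavily.
beta = weighted_lasso(X, y, lam=0.5, penalty_factors=[0.2, 3.0])
print(beta)
```

With equal true effects, the lightly penalised feature keeps a coefficient near 0.9 while the heavily penalised one is set to zero, which is how feature-specific penalties can yield a sparser predictor set.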

6

Antidepressant Maintenance Versus Active Monitoring After Depression Remission: A Decision Analysis Stratified by Relapse Risk and Patient Preferences

Meyerson, W. U.; Cai, T.; Smoller, J. W.

2026-07-20 psychiatry and clinical psychology 10.64898/2026.07.17.26358340 medRxiv

Top 0.1%

1.7%

Importance: Patients who achieve remission from major depressive disorder (MDD) often face a preference-sensitive decision between continued antidepressant maintenance and discontinuation with active monitoring. Quantifying the tradeoff between depression burden and long-term medication exposure may support more individualized shared decision-making. Objective: To quantify tradeoffs between continuous antidepressant maintenance and active monitoring after MDD remission, and to identify preference thresholds favoring each strategy across relapse-risk strata. Design: Individual-level decision-analytic health-state transition model calibrated to randomized maintenance-discontinuation trials and a longitudinal first depressive episode cohort, with a 5-year time horizon. Setting: Outpatient clinical decision after completion of an 8-month continuation phase following remission from MDD. Participants: Adults in remission from MDD, represented across 4 clinically anchored relapse-risk strata ranging from very low risk after a first mild episode to high risk after highly recurrent depression. Exposures: Continuous antidepressant maintenance vs discontinuation with active monitoring and antidepressant restart after detected relapse. Main Outcomes and Measures: Severity-weighted depression-months, antidepressant medication-years, medication-years per depression-month averted, and net benefit across preference thresholds defined as the maximum additional medication-years a patient would be willing to accept to avert 1 depression-month. Results: Continuous maintenance reduced depression burden but required substantially more medication exposure, with efficiency strongly dependent on relapse risk. Medication-years per depression-month averted ranged from 11.8 (95% uncertainty interval [UI], 7.8-19.6) in the very low-risk group to 1.5 (95% UI, 0.8-3.0) in the high-risk group. 
At a preference threshold of 3 medication-years per depression-month averted, maintenance was preferred for moderate- and high-risk patients; at a threshold of 2, only for high-risk patients; and at a threshold of 1, for no risk group. Conclusions and Relevance: In this decision-analytic model, the value of continuous antidepressant maintenance depended strongly on baseline relapse risk and patient preferences regarding long-term medication exposure. These findings provide a quantitative framework for shared decision-making about antidepressant maintenance after remission from MDD.
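
The preference threshold can be read as a simple decision rule: maintenance is favoured when the medication-years needed to avert one depression-month do not exceed what the patient is willing to accept. The sketch below applies that rule to the two point estimates given in the abstract. It is a simplification that ignores the uncertainty intervals and the model's net-benefit calculation, and the function name is illustrative.

```python
# Point estimates from the abstract (medication-years per depression-month averted).
# The abstract does not report values for the intermediate strata, so they are omitted.
cost_per_month_averted = {"very low risk": 11.8, "high risk": 1.5}


def prefers_maintenance(cost: float, threshold: float) -> bool:
    """Maintenance is favoured when its cost does not exceed the patient's threshold."""
    return cost <= threshold


for threshold in (3, 2, 1):
    favoured = [
        group
        for group, cost in cost_per_month_averted.items()
        if prefers_maintenance(cost, threshold)
    ]
    print(threshold, favoured)
```

For the two strata with reported estimates, the rule matches the abstract: the high-risk group favours maintenance at thresholds of 3 and 2 but not at 1, and the very low-risk group never does.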

7

Conversational trajectory degrades large language model detection of suicidal ideation relative to clinicians: a preregistered study

Kalinich, M.; Luccarelli, J.; Santa Maria, J.; Flathers, M.; Nguyen, A.; Song, S. H.; Makhoul, K.; Rivera Criado, M. J.; Ginapp, C. M.; Hill, B.; Shumate, J. N.; Notsu, H.; Smith, C.; Moss, F.; Torous, J.

2026-07-14 psychiatry and clinical psychology 10.64898/2026.07.10.26357132 medRxiv

Top 0.1%

1.7%

Background General-purpose large language models increasingly encounter emotional and therapy-like conversation, yet are not developed or evaluated as clinical systems. Existing safety evaluations rely largely on brief exchanges, although harms often unfold over extended interactions. Whether models maintain safety-relevant performance as conversations accumulate context remains unknown. Methods In this preregistered study, 400 clinician-validated statements, with or without suicidal ideation, were inserted at 0-200 speaker turns in 5 psychotherapy and 3 synthetic transcripts. Forty-nine LLMs and 8 clinicians performed the same binary classification task. Mixed-effects models estimated the effects of conversational depth, model scale, and model version on F1. Twelve top models were tested to 1,500 turns across conversational trajectories, with or without instruction restatement. Results F1 declined with depth across model families (p<0.001). Larger, newer models performed better but still degraded. Clinicians showed no decline (mean F1 0.86 at both 0 and 200 turns), but eight of nine proprietary models exceeded their performance at 200 turns. Conversational content, not length alone, explained F1 changes; the largest decrease was under adversarial context (p<0.001). Restating instructions increased F1 on therapy to near baseline (median ΔF1 +0.12; p<0.001; 89% median recovery) versus MSJ (ΔF1 +0.08; p=0.04; 38% recovery). Conclusions LLM detection of suicidal ideation degraded with conversational depth and trajectory, whereas clinician performance remained stable despite the strongest models exceeding most clinicians in absolute performance. Mental health AI safety evaluations should test sustained performance across realistic and adversarial trajectories rather than relying on short-prompt benchmarks.

8

The Shape of a Final Message: An Emotional Landscape in the Language of Suicide

Pestian, J. P.; Jacobson, D. A.; Pedapati, E. V.; Mendonca, E. A.; McMahon, B. H.; Ive, J.; Glauser, T. A.

2026-07-17 psychiatry and clinical psychology 10.64898/2026.07.16.26358230 medRxiv

Top 0.1%

1.7%

The emotional content of suicide notes is typically examined using categorical coding, where each labeled passage is treated in isolation from its surrounding language. In contrast, dimensional models of psychopathology propose that affective content varies along continuous gradients. We evaluated this proposition directly. Excerpts from 884 annotated suicide notes were embedded in a semantic space defined solely by their linguistic properties, and we investigated whether human-assigned emotion labels changed smoothly across this space. They did: affective tone showed clear spatial autocorrelation (Moran's I = 0.18, z = 19.68, p < 0.001), an effect that replicated across three different encoders and remained after removing all within-note dependencies. Emotions occupied recognizable yet overlapping regions rather than forming distinct clusters and varied substantially in how tightly they were concentrated: love and hopelessness appeared with similar frequency, but love was far more localized (z = 15.7 versus 10.8). Among all emotions, hopelessness was the most linguistically diffuse, implying that a single categorical label is capturing multiple, qualitatively different manifestations of suicidal distress.
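
Moran's I measures whether neighbouring points carry similar values: positive values indicate that similar labels cluster together, and negative values indicate that neighbours tend to differ. The sketch below computes the standard global statistic on a toy chain of four points. How the study defined neighbours in the embedding space is not stated in the abstract, so the binary adjacency weights here are an assumption.

```python
def morans_i(values, weights):
    """Global Moran's I for values under a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / total_weight) * (cross_products / squared_deviations)


# Four points on a line; each point neighbours the points next to it.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(morans_i([1, 1, 0, 0], chain))  # similar values adjacent: positive
print(morans_i([1, 0, 1, 0], chain))  # alternating values: negative
```

On this toy chain the clustered labelling gives I of about 0.33 and the alternating labelling gives -1.0, which shows the sign convention behind the positive value reported in the abstract.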

9

Reconsidering the case against risk prediction in self-harm: routinely collected health data distinguishes groups at higher and lower risk of adverse outcomes following paracetamol overdose

Oxley, J.; Schölin, L.; Brennan, G.; Anand, A.; Brett, J.; Eddleston, M.; Humphries, C.

2026-07-17 psychiatry and clinical psychology 10.64898/2026.07.15.26358127 medRxiv

Top 0.1%

1.6%

Background. UK clinical guidance recommends that structured risk prediction tools and risk stratification should not be used in self-harm, to predict suicide or determine who is offered treatment. Underpinning this position is the premise that routinely collected health data contain no useful predictive signal, which has received little direct scrutiny. Objective. To test whether routinely collected electronic health record data can distinguish groups at higher and lower risk of severe outcomes following paracetamol overdose. Methods. We analysed 4,095 adults presenting to NHS Lothian emergency departments with paracetamol overdose (2017-2023). Elastic-net logistic regression was fitted to 37 routinely collected electronic health record features to predict a composite of death or mental health inpatient admission at 0-7, 8-30 and 31-365 days following attendance, evaluated on a held-out 20% test set with bootstrapping. Findings. Events occurred in 5.5% of patients at 0-7 days, 2.0% at 8-30 days and 7.9% at 31-365 days, dominated by mental health admission. Bootstrap AUROC 95% confidence intervals lay above 0.5 in every window (0.65-0.82, 0.63-0.90, 0.71-0.85): models ranked patients better than chance. Calibration slopes (1.04, 1.14, 1.07) were close to one. Ranking drew primarily on mental health-related features. Conclusions. Routinely collected health data carried predictive signal for severe outcomes after paracetamol overdose, although discrimination fell short of what is needed for individual-level clinical use. Clinical implications. These models are not proposed for clinical deployment; however, treating risk prediction as a settled question will redirect research efforts, potentially excluding this patient population from machine learning advances driving improvements in care in other medical specialties.
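
A bootstrap confidence interval for AUROC, as referred to above, can be obtained by resampling the test set with replacement and recomputing the statistic each time. The sketch below uses the percentile method on toy data. The abstract does not specify the bootstrap variant or the number of resamples, so those choices and all names here are assumptions.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # redraw when the resample lacks one of the classes
        estimates.append(auroc(sample_labels, [scores[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy test set: 1 marks an adverse outcome, scores are predicted risks.
labels = [0, 0, 1, 1, 0, 1, 0, 0]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.30, 0.05]

print(auroc(labels, scores))
print(bootstrap_auroc_ci(labels, scores, n_boot=500))
```

An interval whose lower bound lies above 0.5 is the criterion the abstract uses to conclude that the models ranked patients better than chance.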

10

Risk factors for suicide and repeat self-harm: a cohort study of adults with hospital-presenting self-harm

Flygare, O.; Bjureberg, J.; Wallert, J.; Doering, S.; Salander Renberg, E.; Waern, M.; Runeson, B.

2026-06-24 psychiatry and clinical psychology 10.64898/2026.06.15.26355458 medRxiv

Top 0.1%

1.6%

Background: Previous self-harm elevates the risk of repeat self-harm and suicide, but the prognostic value of events and clinician observations around the index event is unclear. We evaluated established and exploratory risk factors for suicide and repeat self-harm among patients presenting to emergency psychiatric units after a suicide attempt or nonsuicidal self-injury (NSSI). Methods: Multicentre cohort study in Sweden (n = 804). Outcomes were suicide and repeat self-harm at 1-year and 5-year follow-up, ascertained through linked national registers. Established risk factors included psychiatric diagnoses, prior suicidal behaviour, and sociodemographic characteristics; exploratory factors comprised past-week self-reported symptom changes and clinician observations. LASSO-regularised Cox regression models were fitted for established (n=21) and exploratory (n=11) risk factors. Results: During five-year follow-up, 285 (35%) individuals had a new episode of self-harm and 41 (5%) died by suicide. No risk factors reached statistical significance for suicide, although male sex was retained after regularisation (1-year hazard ratio [HR] = 3.57 [95% CI 0-8.33]; 5-year HR = 2.5 [0.03-4.55]). Three established risk factors were significantly associated with repeat self-harm: psychiatric inpatient care in the three months before the index event (1-year HR = 1.85 [1.3-2.6]; 5-year HR = 1.72 [1.23-2.65]), previous suicide attempt (1-year HR = 2.01 [0.79-2.4]; 5-year HR = 2.19 [1.27-2.6]), and borderline personality disorder (1-year HR = 1.82 [1.13-3]; 5-year HR = 1.67 [0.14-2.75]). Among exploratory risk factors, clinician-observed hopelessness (1-year HR = 1.72 [1.1-2.3]; 5-year HR = 1.51 [1.03-1.91]) and personality disorder features (1-year HR = 1.48 [0.96-2.05]; 5-year HR = 1.47 [1.04-1.95]) were associated with repeat self-harm. Conclusions: Risk factor profiles for repeat self-harm were consistent at 1 and 5 years. 
Beyond established risk factors, clinician-observed hopelessness and personality disorder features emerged as markers of risk, suggesting that qualitative clinician assessments may yield prognostic information not available from medical records alone.

11

Structure-Informed Cognitive Representation Improves Prediction of Real-World Functioning in Schizophrenia: A Comparison with Conventional Domain Scores

Chen, C.

2026-06-29 psychiatry and clinical psychology 10.64898/2026.06.25.26356524 medRxiv

Top 0.1%

1.6%

Predicting real-world functional outcomes in schizophrenia (SCZ) remains a clinical priority, but existing models are limited by methodological constraints and a lack of established clinical utility. Cognition is a commonly used predictor, and the Normative Latent Cognitive Structure (N-LCS) approach provides a structure-informed representation that may address limitations of conventional domain-level scores. Data from two merged COBRE cohorts (163 SCZ, 180 healthy controls) were used to develop ridge regression models for economic (EF), occupational (OF), and social (SF) functioning, using N-LCS deviation metrics alongside a priori selected demographic and clinical predictors. Score-based models using MCCB domain T-scores were developed for comparison. Performance was evaluated using bootstrap-corrected AUC, balanced accuracy, and calibration for binary outcomes, and weighted kappa and log-loss for SF. Decision curve analysis (DCA) was used to assess clinical utility for the binary outcomes. The EF model achieved a corrected AUC of 0.76 and balanced accuracy of 0.73. The OF model achieved 0.72 and 0.71, respectively. The SF model showed modest performance (weighted kappa = 0.33). DCA indicated net benefit across the full threshold range for EF and above 0.37 for OF. N-LCS models demonstrated comparable or modestly superior performance to score-based models while using fewer predictors and showing better calibration for EF. These findings support the predictive utility of N-LCS for functional outcomes in SCZ and underscore the need for external validation in independent cohorts as a next step toward clinical application.

12

Beyond the Sentence: Clinical and Social Determinants of Forensic Hospitalization Duration in Northern Israel

Kovalenko, I.; Simonov, S.; Shamir, A.; Sharony, L.

2026-06-29 psychiatry and clinical psychology 10.64898/2026.06.25.26356525 medRxiv

Top 0.1%

1.4%

Purpose: Involuntary psychiatric hospitalization under court orders requires careful balancing of legal obligations and clinical needs. Identifying factors that influence the length of these hospital stays helps clarify the relationship between legal frameworks and psychiatric treatment. This study aims to describe the socio-demographic, clinical, and legal profiles of individuals hospitalized under court warrants and to identify factors independently associated with the duration of forensic hospitalization. Methods: A retrospective study was conducted on 119 patients discharged between 2018 and 2023. Data were collected from medical and legal records, including socio-demographic details, psychiatric diagnoses, offense types, hospital stay lengths, and legal proceedings. Results: Most patients were men (91.6%) diagnosed with schizophrenia or schizoaffective disorder (97.5%), with high rates of comorbid substance use disorder (79.0%) and unemployment (85.7%). The median hospital stay was 19.0 months, representing 40% of the maximum statutory sentence. Patients with low-severity offenses served a larger share of their maximum sentence (47%) than those with high-severity offenses (24%). Time to first discretionary leave was the strongest predictor of total stay duration in univariable analysis. Conclusion: The finding that patients with minor offenses have longer hospital stays than those with serious offenses confirms that clinical factors, rather than offense severity, primarily influence discharge decisions. These findings support moving toward personalized, clinically focused, and family-inclusive forensic discharge planning while maintaining public safety.

13

Polygenic associations with phenotypic classes across the psychosis-affective spectrum

Dennison, C. A.; Legge, S. E.; Cardno, A. G.; Quattrone, D.; Holmans, P.; Di Florio, A.; Gordon-Smith, K.; Jones, I.; Jones, L.; Owen, M. J.; O'Donovan, M.; Walters, J. T.

2026-07-14 psychiatry and clinical psychology 10.64898/2026.07.10.26357470 medRxiv

Top 0.1%

1.3%

Introduction Limitations of current classifications of schizophrenia, schizoaffective disorder, and bipolar disorder are evident from their overlapping symptoms, aetiologies, treatments, and outcomes, and present a barrier to novel treatment discovery. Alternative conceptualisations are needed to address nosological validity, align diagnosis to aetiology, and improve prognostication and treatment choice. We aimed to identify latent classes across the psychosis spectrum based on premorbid functioning and outcomes, and assess these in relation to genetic liability and symptom dimensions. Method Participants with a diagnosis of schizophrenia, schizoaffective disorder, or bipolar disorder type 1, were ascertained from four UK clinical cohorts (total n=5,043). Latent class analysis was conducted using phenotypes not included within the diagnostic criteria, including premorbid functioning, age at illness onset, and measures of severity and course. Polygenic scores (PGS) for psychiatric disorders and behavioural traits were tested for associations with latent classes. We tested if diagnosis explained associations between PGS and classes. Results A three-class model provided the best fit. Class one had poorer premorbid functioning, lower rates of recovery, and higher PGS for schizophrenia and ADHD. Class three had the highest functioning, higher rates of psychosocial stressors before onset, higher intelligence PGS and lower PGS for psychiatric disorders. Class two was intermediate between classes one and three on measures of functioning, but was characterised by high levels of involuntary hospital admissions and high bipolar disorder PGS. Diagnosis only partially explained associations between PGS and class membership. Conclusions We identified classes across the psychosis spectrum characterised by different premorbid functioning and outcomes, that cut across diagnostic categories and captured genetic liability not explained by diagnosis. 
Our findings suggest alternative conceptualisations of psychotic disorders may complement diagnoses in mapping to the aetiology of these conditions, and could be useful to advance precision psychiatry.

14

Role-Prompting in Frontier Large Language Models Influences Clinical Reasoning in Complex Medical Cases

Dave, C.; Diviero, A.; Dassanayake, T.; Alshahrani, S. J.; Al Mardini, A.; Khadir, W.; Patel, A. D.; Srivastava, A.

2026-07-01 medical ethics 10.64898/2026.06.29.26356864 medRxiv

Top 0.1%

1.1%

Background: Large language models (LLMs) are increasingly deployed in healthcare, where they may adopt different stakeholder perspectives, yet the effect of role-prompting on clinical ethical reasoning remains poorly characterized. Methods: We evaluated three frontier LLMs (Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro) across 25 ethically complex medical cases. Each model responded from three stakeholder perspectives (physician, patient, insurer) across three independent runs (675 total responses). Decisions were benchmarked against a six-physician panel. Ethical value prioritization was analyzed from physician- and LLM-provided ranked values. A Patient-Centric Decision Index (PCDI) was developed to quantify LLM decision alignment with patient-preferred outcomes. Results: Among 20 cases with clear physician consensus, LLMs prompted as an insurer reduced alignment with physician majority by 50% for GPT-5.4 (p = 0.004), 45% for Gemini 3.1 Pro (p = 0.008) and 10.5% (NS) for Opus 4.6. The insurer role shifted primary ethical values from beneficence (27%) to financial stewardship (20%) across all LLMs. Conclusions: Stakeholder role-prompting fundamentally alters clinical decisions and ethical value frameworks of frontier LLMs, with the insurer role producing systematic denial of physician-endorsed, patient-preferred treatments. These findings raise the need for standardized LLM patient-centricity benchmarks, and physician oversight when LLMs are deployed in clinical decision-making.

15

Co-development of anxiety and depression in UK and Brazil youth; a cross-country comparison

Shakeshaft, A.; Barrass, L.; Farooq, B.; Riglin, L.; Goncalves Soares, A. L.; Jones, H. J.; Lidbetter, N.; Knipe, D. J.; Penton-Voak, I.; Carpena, M. X.; dos Santos, I. S.; Tovo-Rodrigues, L.; Heron, J.; Rice, F.; Matijasevich, A.; Howe, L. D.

2026-06-24 psychiatry and clinical psychology 10.64898/2026.06.22.26356231 medRxiv

Top 0.1%

1.1%

Importance: Anxiety and depression frequently co-occur and show developmentally patterned co-development from childhood to adolescence. Adult psychiatric outcomes vary according to the timing, sequencing, and persistence of early symptoms, yet it remains unclear whether patterns of co-development are comparable across high-income and low- and middle-income country contexts. Objective: To examine joint developmental trajectories of anxiety and depression from childhood to adolescence and their associations with anxiety and depression diagnoses in young adulthood. Design, Setting and Participants: Population-based prospective cohort studies in the UK (Avon Longitudinal Study of Parents and Children [ALSPAC], N=9,586) and Brazil (Pelotas 2004 Birth Cohort, N=3,815). Main Outcomes and Measures: Trajectories were derived using parallel process latent growth models and latent class growth analyses of anxiety and depression, measured with the Development and Well-Being Assessment at early childhood (6-7 years), middle childhood (10-11 years), and adolescence (13-15 years). Diagnoses of anxiety and depression at 18 years were assessed via the Clinical Interview Schedule (ALSPAC) and the Mini International Neuropsychiatric Interview (Pelotas). Results: Prevalence of anxiety and depression from early childhood to adolescence was similar across cohorts. Co-development was stronger in ALSPAC, with modest increases in both conditions, whereas in Pelotas, anxiety increased rapidly while depression showed little average change. In both cohorts, four trajectory classes were identified: stable-low (ALSPAC, 41%; Pelotas, 54%), increasing (31%; 28%), decreasing (23%; 15%), and persistent-high anxiety/increasing depression (5%; 3%). Compared with the stable-low class, youth in the increasing and persistent-high classes had elevated odds of depression (ALSPAC: OR=2.0 [95% CI, 1.4-2.8] and 4.2 [2.6-6.7]; Pelotas: 2.2 [1.5-3.3] and 2.9 [1.4-6.0]) and anxiety in young adulthood (ALSPAC: 1.6 [1.2-2.2] and 4.8 [3.2-7.0]; Pelotas: 1.7 [1.2-2.6] and 2.9 [1.5-5.8]). No increased risk was observed in the decreasing class. Conclusions and Relevance: Patterns of anxiety and depression co-development were comparable across the UK and Brazil, suggesting shared developmental pathways. However, more rapid increases in anxiety among Brazilian youth may reflect context-specific risk factors. Persistence or emergence beyond early childhood was critical for identifying later diagnostic risk in both settings, highlighting the importance of early monitoring and intervention.

16

Systematic Review and Meta-Analysis: Do Youth-Reported Psychosis Symptoms Predict Later Mental Health Diagnosis?

Shah, J. N.; Ameis, S. H.; Donato, C. A.; Wei, I.; Dabagh, Y. A.; Cleverley, K.; Courtney, D. B.; Foussias, G.; Kozloff, N.; Voineskos, A. N.; Wang, W.; Dickie, E. W.

2026-07-15 psychiatry and clinical psychology 10.64898/2026.07.13.26357957 medRxiv

Top 0.1%

1.1%

Objective: Psychosis spectrum symptoms (PSS) are common among children and youth. These symptoms may be clinically significant, as studies indicate a heightened risk of mental health disorders in general, and of psychotic disorders specifically, in youth who endorse PSS. This systematic review and meta-analysis investigates the longitudinal association between PSS in children and youth and subsequent mental health diagnosis. Methods: A comprehensive search of the Ovid Medline, PsycINFO, and EMBASE databases was conducted to identify longitudinal studies that: (i) assess PSS at a baseline timepoint, (ii) in individuals under 25 years, and (iii) assess mental health disorder diagnosis using a structured assessment at a later time point in the same sample. We conducted a meta-analysis and calculated pooled odds ratios (ORs) for mental health and psychotic disorders using random-effects models. Post-hoc meta-regressions were performed to examine the influence of several moderators on the relationship between earlier recorded PSS and subsequent mental health disorders or psychotic disorders. Results: The search yielded 41 eligible studies, of which 25 were included in the meta-analysis. Most included studies assessed PSS using brief self-report measures and recruited their samples from clinical or community settings. Among children and youth without an identified mental health diagnosis at baseline assessment, baseline PSS were associated with a 2-fold increased risk (OR = 2.07, CI = 1.61-2.66, I² = 86.92%, p < 0.0001) of meeting diagnostic criteria for a subsequent mental health disorder and a 3-fold increased risk (OR = 3.11, CI = 2.11-4.58, I² = 60.93%, p < 0.0090) of meeting diagnostic criteria for a subsequent psychotic disorder, with a minimum follow-up of 1 year from baseline assessment. Meta-regression analysis indicated that study quality and sample size explained a substantial proportion of between-study heterogeneity for psychotic disorder outcomes. Conclusions: Our results suggest that administration of simple self-report measures of PSS in both clinical and community settings may be helpful to identify children and youth at higher risk of subsequently meeting criteria for a mental disorder generally, and for a severe mental illness (i.e., psychotic disorder) specifically. Future longitudinal studies should focus on improving study design characteristics to increase confidence in identified longitudinal associations. The results of our work suggest that integration of self-report measures of PSS may be useful in a variety of settings to identify youth at increased risk of subsequent mental illness.

17

Childhood emotional symptom trajectories in three generationally and socio-ethnically distinct UK birth cohorts

Fairweather, S. J.; Kwong, A. S. F.; Deniz, E.; Hammerton, G.; Khandaker, G. M.; Jones, H. J.

2026-07-01 epidemiology 10.64898/2026.06.24.26356453 medRxiv

Top 0.2%

0.8%

Background: Depression and anxiety symptoms emerge early in life. We examined developmental trajectories of emotional symptoms, starting from early childhood, in three UK birth cohorts spanning successive generations and diverse socio-ethnic contexts. Methods: Using data from three longitudinal, population-based UK birth cohorts, the Avon Longitudinal Study of Parents and Children (ALSPAC), the Millennium Cohort Study (MCS), and Born in Bradford (BiB), we identified group-based trajectories of emotional symptoms using repeated Strengths and Difficulties Questionnaire Emotional Subscale (SDQ-E) scores from ages 3-14y. Baseline samples comprised children with ≥1 SDQ-E measure between ages 3 and 14y (N_ALSPAC = 11,025; N_MCS = 15,446; N_BiB = 6,711). Participants were born three decades apart (ALSPAC: 1990-2, MCS: 2000-2, BiB: 2007-10) in distinct socioeconomic and ethnic contexts. We characterised group membership by female sex, non-white ethnicity, maternal depression/anxiety and Index of Multiple Deprivation (IMD) quintile. In ALSPAC we modelled associations between trajectories and depression/anxiety diagnoses in early adulthood (24y and 30y). Results: In all cohorts 49% were female. ALSPAC had few non-white participants (4%) compared to MCS (17%) and BiB (66%). Each cohort had low-, mid- and high-level symptom trajectories. High-level trajectories comprised 6-7% of the population in each cohort. However, in the younger cohorts, high-level symptom trajectories started high at age 3-5y and persisted, whereas in the oldest cohort they started low and increased. Female sex and maternal depression/anxiety were associated with higher odds of high-level or increasing symptom trajectories across all cohorts. Higher socioeconomic status and belonging to the ethnic majority were protective. Mid- and high-level symptom trajectories had higher odds of depression/anxiety diagnoses in early adulthood in the older ALSPAC cohort. Conclusions: Developmental trajectories of emotional symptoms across childhood and adolescence are broadly similar across generations and diverse social contexts. However, children born more recently and in more diverse contexts may experience more persistent, severe emotional symptoms from a young age. Key words: Longitudinal trajectories; emotional symptoms; SDQ; ALSPAC; MCS; Born in Bradford

18

Trends and variations in Lithium usage across care settings in England between 2015-2024

Schiffer, H.; Fisher, L.; Curtis, H. J.; Wood, C.; Brown, A. D.; Bacon, S. C.; Croker, R.; Goldacre, B.; MacKenna, B.; Speed, V.; Macdonald, O.

2026-07-17 psychiatry and clinical psychology 10.64898/2026.07.15.26357641 medRxiv

Top 0.2%

0.8%

Lithium has been the gold standard for the treatment and prevention of relapse in bipolar disorder for over 60 years. Guidance from the National Institute for Health and Care Excellence explicitly states to 'offer lithium as a first-line, long-term pharmacological treatment for bipolar disorder'. Yet in the last two decades its use has declined, with clinicians favouring anticonvulsants or antipsychotics when treating this condition. In this study, we used three openly available datasets containing prescribing data from primary and secondary care to explore trends in the use of lithium in England, showing both regional and temporal variation between 2015 and 2024. We show that lithium use in primary care declined by 20.9% in the last ten years (2015-2024) and by 10.9% overall in the last five years (2019 to 2025). We also show that there is some regional variation in the source of lithium for patients, although the vast majority is prescribed in primary care. Further research into clinical behaviour is needed to understand what is driving the decrease in lithium usage, and what barriers and enablers may influence its use across the country.

19

First-Trimester Non-Invasive Prediction of Preterm Birth Using Cell-Free DNA Fragmentomics

Pham, M.-D. N.; Phan, M.-T. T.; Tran, N.-T.; Vo, T.-S.; Le, H.-T.; Nguyen, T.-H. T.; Nguyen, Q.-H. V.; Ha, M.-T. T.; Le, T. M.; Hoang, D.-T. T.; Huynh, K.-T. N.; Nguyen, N. V.; Nguyen, C. C.; Bui, T. C.; Nguyen, X. T.; Le, S. V.; Tran, V. D.; Nguyen, M.-N. B.; Nguyen, T. V.; Nguyen, T.-A. T.; Hoang, B. P.; Nguyen, T. V.; Nguyen, T.-A. T.; Nguyen, T. T.; Duong, T. D.; Pham, C. H.; Luong, K.-O. T.; Dao, C. N.; Hoang, K. V.; Huynh, T.-T. T.; Nguyen, K. M.; Tran, S.-T. T.; Tran, H. T.; Nguyen, S. C.; Tran, T. D.; Nguyen, P. T. L.; Pham, T. V.; Pham, K. C.; Thai, M. D.; Do, T.-T. T.; Dao, H. T.; Va

2026-07-11 genomics 10.64898/2026.07.07.736241 medRxiv

Top 0.2%

0.6%

Objective: To develop and validate a cell-free DNA (cfDNA) fragmentomic classifier for the early prediction of spontaneous preterm birth (PTB) using routine first-trimester non-invasive prenatal testing (NIPT) data. Methods: A nested case-control study was conducted within a prospective multicenter Vietnamese cohort comprising 286 pregnancies, including 82 spontaneous PTB cases and 204 term controls. Maternal plasma cfDNA collected during routine first-trimester NIPT (median gestational age, 12 weeks) was sequenced to a depth of approximately 20 million reads per sample. Five fragmentomic feature categories (copy number alterations, end-motif composition, nucleosome distance, fragment length, and joint fragment-length × end-motif) were evaluated for PTB prediction. Machine learning classifiers were developed in a training cohort (n = 228, 65 PTB vs 163 TB) and tested in a validation cohort (n = 58, 17 PTB vs 41 TB). Results: Among the five fragmentomic feature classes evaluated, 4-mer end-motif (EM) profiles exhibited the most pronounced differences between PTB and term control samples. Consistent with these findings, the EM-based classifier demonstrated the highest discriminative performance in the validation cohort, achieving an AUC of 0.970 (95% CI, 0.912-1.000). At a specificity >90%, the model achieved a sensitivity of 94% (95% CI, 78-100%). Conclusion: These findings demonstrate that cfDNA EM signatures derived from routine first-trimester NIPT can accurately identify pregnancies at risk of spontaneous preterm birth, without additional blood collection or sequencing, thereby extending the clinical utility of existing prenatal screening infrastructure.

KEY POINTS

What is already known about this topic?
- Current first-trimester prediction strategies based on maternal characteristics, cervical length, and biochemical markers have limited predictive accuracy, particularly in nulliparous women.
- Existing cfDNA-based approaches have shown only modest performance or require additional assays, limiting clinical applicability.

What does this study add?
- Existing NIPT sequencing data can be repurposed (without additional blood sampling or sequencing) for accurate prediction of spontaneous preterm birth (AUC=0.970).
- A classifier employing 4-mer end-motif (EM) profiles achieved an AUC of 0.970. At a specificity >90%, the model achieved a sensitivity of 94%.
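
To make the reported operating point concrete, the sketch below relates sensitivity and specificity to confusion-matrix counts for a validation cohort of 17 PTB cases and 41 term controls. The individual counts (16 detected cases, 37 correctly classified controls) are hypothetical: they were chosen only because they are consistent with the reported 94% sensitivity at >90% specificity, and the abstract does not state them.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cases that the classifier flags."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true controls that the classifier clears."""
    return tn / (tn + fp)


# Cohort sizes from the abstract.
n_ptb, n_term = 17, 41

# Hypothetical confusion-matrix counts (assumption, not reported data).
tp, fn = 16, n_ptb - 16
tn, fp = 37, n_term - 37

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

print(f"sensitivity = {sens:.1%}")
print(f"specificity = {spec:.1%}")
```

With only 17 cases, a single misclassified pregnancy moves sensitivity by about six percentage points, which is in line with the wide confidence interval the abstract reports.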

20

Fairness-aware, explainable clinical decision support for opioid use disorder risk stratification: development and internal validation of a dual-layer AI system

Kazgan, M.; Mohammadvand, N.; Cetin, B.

2026-06-26 pain medicine 10.64898/2026.06.23.26356401 medRxiv

Top 0.2%

0.6%

Opioid use disorder (OUD) remains a leading cause of preventable death in the United States, yet the tools used to assess OUD risk rely on episodic self-report, produce binary output, and exhibit documented performance disparities across demographic groups that can widen existing inequities in care. Machine-learning models for OUD risk are seldom evaluated for demographic fairness or designed for the transparency clinicians need to trust and act on them. We present a fairness-aware, explainable clinical decision support system for four-tier OUD risk stratification, developed and internally validated on a large electronic health record-derived cohort accessed via Mayo Clinic Platform_Discover. The system pairs an XGBoost classifier with a transparent Clinical Rules Engine that attributes risk across six clinical domains, providing clinician-interpretable explanations alongside each prediction. To address demographic disparity directly, we applied an iterative bias-mitigation strategy combining age-balanced resampling, removal of race as a model input, and cost-sensitive reweighting, and measured its effect using group-fairness metrics (demographic parity, equal opportunity, equalized odds, and calibration within groups). On a held-out internal test set, mitigation reduced the White-Black gap in high-risk detection from 30.3 to 7.4 percentage points (a 76% relative reduction) and the age-based accuracy gap from 6.6 to 2.7 percentage points (59% reduction), raising high-risk detection for Black patients from 58.3% to 75.0%, at a cost of fewer than two percentage points of overall accuracy; gender differences remained below three points. The system was independently qualified through Mayo Clinic Platform_Solutions Studio. This work offers an implementable, transparent blueprint for operationalizing fairness and explainability in clinical AI for high-risk prescribing, with external and prospective validation as the clear next steps.
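
The relative reductions quoted above can be reproduced from the reported gaps, and an equal-opportunity gap of this kind is the difference in true-positive rate between two groups. The sketch below is a minimal illustration: the function names, group labels and toy labels/predictions are hypothetical, and it is not the authors' evaluation code.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


def true_positive_rate(y_true, y_pred) -> float:
    """Share of actual positives that were predicted positive."""
    hits = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(hits) / len(hits)


def equal_opportunity_gap(y_true, y_pred, groups, a, b) -> float:
    """TPR of group `a` minus TPR of group `b`, in percentage points."""
    def tpr(g):
        yt = [t for t, grp in zip(y_true, groups) if grp == g]
        yp = [p for p, grp in zip(y_pred, groups) if grp == g]
        return true_positive_rate(yt, yp)
    return (tpr(a) - tpr(b)) * 100


# Gaps reported in the abstract (percentage points).
race_gap_reduction = relative_reduction(30.3, 7.4)
age_gap_reduction = relative_reduction(6.6, 2.7)
print(round(race_gap_reduction), round(age_gap_reduction))  # 76 59

# Toy data (hypothetical): 1 = high risk, groups "A" and "B".
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
groups = ["A"] * 6 + ["B"] * 6
gap = equal_opportunity_gap(y_true, y_pred, groups, "A", "B")
print(gap)  # 25.0
```

Reporting the gap in percentage points alongside the relative reduction, as the abstract does, avoids overstating changes that start from a small baseline gap.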